Diabetes Type II Indian dataset project

Group 1

Members

  • Anna Lifousi (s232979)
  • Jordan Sylvester Fernandes (s222497)
  • Manuel Arcieri (s230158)
  • Quim Bech Vilaseca (s233374)
  • Xavier Viñas Margalef (s233532)

Introduction

  • Diabetes is estimated to affect approximately 530 million adults worldwide, with a global prevalence of 10.5 percent among adults aged 20 to 79 years. 1

  • Type 2 diabetes represents approximately 98 percent of global diabetes diagnoses, although this proportion varies widely among

  • Aim: evaluate the factors that may influence the onset of Type 2 diabetes, to support its control and prevention.

Materials and Methods

Data Acquisition and Description

Data Cleaning, Augmentation and Analysis

  • Data Loading: Automatic data loading from server.
  • Data Cleaning: Zero values in several variables treated as NA. For Insulin, a zero was treated as NA only if the patient was diabetic.
  • Data Augmentation: Columns renamed to improve clarity and new categories created for BMI and Age.
  • Data Description & Analysis: Data summarized statistically and visualized graphically. Principal Component Analysis (PCA) used to explore the relationships between variables.
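
The cleaning and augmentation steps above could be sketched roughly as follows. This is a minimal illustration in Python with pandas, not the project's actual code: the column names, the list of columns whose zeros become NA, the BMI cut-offs and the age bands are all assumptions made for the example, and the four-row table is invented sample data.

```python
import numpy as np
import pandas as pd

# Invented sample rows; column names are assumed, not taken from the project.
df = pd.DataFrame({
    "Glucose": [148, 0, 183, 89],
    "BloodPressure": [72, 66, 0, 66],
    "SkinThickness": [35, 29, 0, 23],
    "Insulin": [0, 0, 0, 94],
    "BMI": [33.6, 26.6, 0.0, 28.1],
    "Age": [50, 31, 32, 21],
    "Outcome": [1, 0, 1, 0],  # 1 = diabetic, 0 = non-diabetic
})


def clean(data):
    """Return a copy with implausible zeros set to NA and new categories added."""
    out = data.copy()

    # Assumed list of variables where a zero cannot be a real measurement.
    for col in ["Glucose", "BloodPressure", "SkinThickness", "BMI"]:
        out[col] = out[col].replace(0, np.nan)

    # Insulin: a zero becomes NA only when the patient is diabetic.
    diabetic_zero = (out["Insulin"] == 0) & (out["Outcome"] == 1)
    out["Insulin"] = out["Insulin"].astype(float).mask(diabetic_zero)

    # Assumed BMI classes (left-closed intervals, e.g. 25 <= BMI < 30).
    out["BMI_class"] = pd.cut(
        out["BMI"],
        bins=[0, 18.5, 25, 30, np.inf],
        labels=["underweight", "normal", "overweight", "obese"],
        right=False,
    )

    # Assumed ten-year age bands.
    out["Age_group"] = pd.cut(
        out["Age"],
        bins=[20, 30, 40, 50, 60, np.inf],
        labels=["20-29", "30-39", "40-49", "50-59", "60+"],
        right=False,
    )
    return out


cleaned = clean(df)
print(cleaned)
```

Rows with a missing BMI simply get a missing BMI class, so the category columns never hide the fact that a value was absent.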

Results

Correlation heatmap

  • As expected, BMI and skin thickness are strongly correlated, since skin thickness increases with BMI.
  • Weak correlation between insulin and glucose levels, possibly because insulin levels increase when blood glucose rises during the oral glucose tolerance test (OGTT).
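
For readers who want to see how the numbers behind such a heatmap are obtained, here is a small sketch in Python with pandas. It assumes Pearson correlation (the slides do not say which coefficient was used) and runs on synthetic data in which skin thickness is deliberately constructed to depend on BMI; none of the values come from the project.

```python
import numpy as np
import pandas as pd

# Synthetic data only: SkinThickness is built from BMI, Glucose is independent.
rng = np.random.default_rng(0)
n = 200
bmi = rng.normal(32, 6, n)
skin = 0.8 * bmi + rng.normal(0, 3, n)
glucose = rng.normal(120, 30, n)

data = pd.DataFrame({"BMI": bmi, "SkinThickness": skin, "Glucose": glucose})

# Pairwise Pearson correlation; rows with NA are dropped per pair of columns.
corr = data.corr(method="pearson")
print(corr.round(2))
```

The resulting square matrix is what a heatmap colours cell by cell; any plotting library that accepts a 2D array can display it.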

Principal component analysis

  • The data was scaled and centred before performing PCA.

  • There’s no clear separation between the two classes using only the first two principal components (PCs).

  • The first two principal components only account for around 50% of the total variance.

  • To explain at least 90% of the total variance, we would have to include 6 of the 8 PCs.
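
The variance figures above follow from the standard PCA recipe: centre and scale each variable, decompose, and accumulate the share of variance per component. The sketch below shows that computation in Python with NumPy via a singular value decomposition. It is an illustration under assumptions, not the project's code: the function names are hypothetical and the input is random stand-in data with the same shape as the dataset (768 rows, 8 variables), so its output does not reproduce the reported percentages.

```python
import numpy as np


def explained_variance_ratio(X):
    """Share of total variance carried by each PC of the centred, scaled data."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # centre and scale
    singular_values = np.linalg.svd(Z, compute_uv=False)
    variances = singular_values**2 / (Z.shape[0] - 1)
    return variances / variances.sum()


def components_needed(ratio, threshold=0.90):
    """Smallest number of leading PCs whose cumulative share reaches threshold."""
    cumulative = np.cumsum(ratio)
    k = np.searchsorted(cumulative, threshold) + 1
    return int(min(k, len(cumulative)))


# Random stand-in data: 8 partly correlated columns driven by 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(768, 3))
X = latent @ rng.normal(size=(3, 8)) + rng.normal(size=(768, 8))

ratio = explained_variance_ratio(X)
print("Share explained by PC1 + PC2:", round(float(ratio[:2].sum()), 3))
print("PCs needed for 90%:", components_needed(ratio))
```

Reading the cumulative sum this way makes the trade-off explicit: when many components are needed to pass the threshold, a two-dimensional PCA plot necessarily discards a large part of the information.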

Discussion

  • We analysed a dataset containing clinical data on 768 patients, some of whom have diabetes.

  • While we’ve found some correlation between individual properties, it’s hard to tell whether someone is ill just from a subset of features.

  • We’ve found a linear correlation between blood pressure and skin thickness.

  • PCA couldn’t separate diabetic from non-diabetic individuals using only the first two principal components.

  • Obesity is a risk factor for diabetes.